CLARIN and Free Open Source Finite-State Tools
Abstract
CLARIN stands for Common Language Resources and Technologies Research Infrastructure. It is one of the 35 infrastructure projects listed in the ESFRI roadmap of European research infrastructures, which covers various areas. CLARIN has now entered its three-year preparatory phase under a grant from the EU Commission; this phase has 32 partner organizations (see www.clarin.eu for more details). Europe has a large number of language resources: text and speech corpora with possible annotations, lexical materials, standards and norms, and programs for parsing and processing such data. These resources are fragmented in several ways, and it is difficult:
Similar resources
Finite-State Spell-Checking with Weighted Language and Error Models—Building and Evaluating Spell-Checkers with Wikipedia as Corpus
In this paper we present simple methods for construction and evaluation of finite-state spell-checking tools using an existing finite-state lexical automaton, freely available finite-state tools and Internet corpora acquired from projects such as Wikipedia. As an example, we use a freely available open-source implementation of Finnish morphology, made with traditional finite-state morphology to...
Porting Basque Morphological Grammars to foma, an Open-Source Tool
Basque is a morphologically rich language, of which several finite-state morphological descriptions have been constructed, primarily using the Xerox/PARC finite-state tools. In this paper we describe the process of porting a previous description of Basque morphology to foma, an open-source finite-state toolkit compatible with Xerox tools, provide a comparison of the two tools, and contrast the ...
OpenFst: An Open-Source, Weighted Finite-State Transducer Library and its Applications to Speech and Language
Finite-state methods are well established in language and speech processing. OpenFst (available from www.openfst.org) is a free and open-source software library for building and using finite automata, in particular, weighted finite-state transducers (FSTs). This tutorial is an introduction to weighted finite-state transducers and their uses in speech and language processing. While there are othe...
Using HFST for Creating Computational Linguistic Applications
HFST – Helsinki Finite-State Technology (hfst.sf.net) is a framework for compiling and applying linguistic descriptions with finite-state methods. HFST currently collects some of the most important finite-state tools for creating morphologies and spellcheckers into one open-source platform and supports extending and improving the descriptions with weights to accommodate the modeling of statistic...
HFST Tools for Morphology - An Efficient Open-Source Package for Construction of Morphological Analyzers
Morphological analysis of a wide range of languages can be implemented efficiently using finite-state transducer technologies. Over the last 30 years, a number of attempts have been made to create tools for computational morphologies. The two main competing approaches have been parallel vs. cascaded rule application. The parallel rule application was originally introduced by Koskenniemi [1983] ...
Publication date: 2008